A METHOD AND SYSTEM FOR SELECTIVELY APPLYING ENHANCEMENT TO AN IMAGE



FIELD OF THE INVENTION 

The invention relates generally to the field of digital image 
processing and, more particularly, to a method for determining the amount of 
enhancement applied to an image based on subject matter in the image. 

BACKGROUND OF THE INVENTION 

In general, image enhancement involves applying one or more
operations to an image to improve the image quality. For example, sharpening
improves image details, noise reduction removes image noise, de-blocking
removes blocking artifacts caused, for example, by JPEG image compression,
scene balance adjustment improves brightness and color balance, and tone-scale
adjustment improves image contrast and rendering.

While these methods do indeed produce enhanced images, the 
quality of the resulting image often varies depending on the image content. For 
example, using the unsharp mask algorithm may produce a pleasing result for an 
image of a building. However, using the same algorithm may result in the 
undesirable appearance of oversharpening for an image of a human face (e.g., 
wrinkles, blemishes may be unpleasantly "enhanced", i.e., made more visible). 
As another example, using a smoothing algorithm helps reduce the amount of
noise and/or blocking artifacts and produces a pleasing result for an image of a
human face or clear blue sky. However, the same amount of the same operation
may result in undesirable removal of details in grass lawn, textured fabric, or
animal hair. Conventionally, the amount of sharpening, or any other type of
enhancement, needs to be adjusted individually for each scene by a human
operator, an expensive process. Another drawback of the conventional approach is
that the amount of sharpening cannot be adjusted easily on a region by region basis
within the same image, resulting in having to apply an amount of enhancement that
is a trade-off between different amounts required by different subject matters or
objects in the scene.

In the prior art, there are examples of modifying an image 
enhancement operation based on pixel color. For example, in US Patent 5,682,443 
issued October 28, 1997, Gouch et al. describe a method of modifying, on a pixel
by pixel basis, the parameters associated with the unsharp mask. Sharpening an 
image with unsharp masking can be described with the following equation: 

s(x,y) = i(x,y)**b(x,y) + βf( i(x,y) - i(x,y)**b(x,y) ) (1)

where:

s(x,y) = output image with enhanced sharpness
i(x,y) = original input image
b(x,y) = lowpass filter
β = unsharp mask scale factor
f() = fringe function
** denotes two dimensional convolution
(x,y) denotes the x-th row and the y-th column of an image

Typically, an unsharp image is generated by convolution of the
image with a lowpass filter (i.e., the unsharp image is given by i(x,y)**b(x,y)).
Next, the highpass, or fringe, data is generated by subtracting the unsharp image
from the original image (the highpass data is given by i(x,y) - i(x,y)**b(x,y)). This
highpass data is then modified by either a scale factor β or a fringe function f() or
both. Finally, the modified highpass data is summed with either the original image
or the unsharp image to produce a sharpened image.
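
By way of illustration only, the unsharp masking steps described above can be sketched in Python as follows. The 3x3 averaging kernel, the edge replication at the image border, and the names box_blur and unsharp_mask are assumptions made for this sketch; they are not taken from Gouch et al. The modified highpass data is summed here with the unsharp image, which is one of the two options mentioned above.

```python
def box_blur(image):
    """Convolve with a 3x3 averaging (lowpass) kernel, replicating edge pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            total = 0.0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    xx = min(max(x + dx, 0), rows - 1)  # clamp row index
                    yy = min(max(y + dy, 0), cols - 1)  # clamp column index
                    total += image[xx][yy]
            out[x][y] = total / 9.0
    return out


def unsharp_mask(image, beta, fringe=lambda v: v):
    """s = unsharp + beta * f(i - unsharp); the fringe function defaults to identity."""
    unsharp = box_blur(image)
    return [
        [u + beta * fringe(i - u) for i, u in zip(img_row, un_row)]
        for img_row, un_row in zip(image, unsharp)
    ]
```

With beta equal to 1 and an identity fringe function the input is returned essentially unchanged; larger values of beta exaggerate the difference between each pixel and its local average, which is the source of the overshoot seen at edges.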

Gouch et al. teach that the fringe function may be dependent on the
color of the pixel i(x,y). This feature allows them to tailor the sharpening
performed for those pixels which are similar in color to flesh, for example.
However, this method is not based on a probability or degree of belief that specific
image pixels represent human flesh, and thus is likely to sharpen too conservatively
image regions having a color similar to human flesh, such as bricks or
wood. The method of Gouch et al. exclusively uses image color and does not
allow for the use of other features such as texture or shape features which research
has shown to effectively classify image regions.

Schwartz discloses the concept of automatic image correction using
pattern recognition techniques in European Patent Application 0681268, filed April
10, 1995, wherein the pattern recognition sub-system detects the presence and
location of color-significant objects. Another similar method of selective
enhancement of image data was described by Cannata et al. in USSN 09/728,365,
filed November 30, 2000, published October 18, 2001. In one embodiment, spatial
processing transformations such as sharpening are reduced or bypassed for pixels
having color code values within a range known to be adversely affected by the
spatial processing transformations. In another embodiment, color correction
processing transformations are bypassed for pixels having color code values within a
neutral color range. Clearly, since color is the only characteristic used to perform
selective enhancement, undesirable effects can be obtained for other subject
matters with similar colors.

Therefore, there exists a need for determining the types and 
amounts of enhancement for a particular image, whereby the local quality (e.g., 
sharpness and color) of the image can be improved depending on detecting 
different objects or subject matters contained within the image. 


SUMMARY OF THE INVENTION 

The need is met according to the present invention by providing a
method for processing a digital color image that includes providing a subject
matter detector for distinguishing between target and background subject matters;
applying the subject matter detector to the image to produce a belief map
indicating the degree of belief that pixels in the image belong to target subject
matter; providing an image enhancement operation that is responsive to a control
signal for controlling the degree of image enhancement; and applying image
enhancement to the digital image by varying the control signal according to the
belief map to produce an enhanced image.

ADVANTAGES

The present invention has the advantage that the local amount of
image enhancement applied to an image can be varied depending on the detected
subject matter within the image. Rather than tuning a system to apply image
enhancement to all images or an entire given image at a conservative level, for fear
of creating enhancement artifacts in some images or some image areas of a given
image, the present invention automatically determines the amount of image
enhancement for each region in an image based on the subject matter content of
the region. Moreover, for certain regions of an image, the enhancement operation
may not be performed at all (i.e., the amount of enhancement is zero) based on the
strong belief that a particular target subject matter exists in the region.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a flowchart illustrating the present invention;

Fig. 2 is an example of a natural image;

Fig. 3 is an example belief map generated by the subject matter
detector 4 when the target subject matter is human flesh with belief values
indicated for the different subject matters of the image;

Fig. 4 is an example belief map generated by the subject matter
detector 4 when the target subject matter is sky with belief values indicated for the
different subject matters of the image;

Fig. 5 is an example belief map generated by the subject matter
detector 4 when the target subject matter is grass with belief values indicated for
the different subject matters of the image;

Fig. 6 is an example belief map generated by the subject matter
detector 4 when the target subject matter is cloud and snow with belief values
indicated for the different subject matters of the image;

Fig. 7 is an example belief map generated by the subject matter
detector 4 when the target subject matter is water with belief values indicated for
the different subject matters of the image;

Fig. 8 is a block diagram illustrating the generation of an
enhancement control signal according to the present invention;

Fig. 9 is a block diagram showing the method for detecting skin-tone
regions; and

Fig. 10 is a block diagram showing the method for detecting subject
matter regions such as sky or grass.

DETAILED DESCRIPTION OF THE INVENTION 

In the following description, the present invention will be described
as a method implemented as a software program. Those skilled in the art will
readily recognize that the equivalent of such software may also be constructed in
hardware. Because image enhancement algorithms and methods are well known,
the present description will be directed in particular to algorithm and method steps
forming part of, or cooperating more directly with, the method in accordance with
the present invention. Other parts of such algorithms and methods, and hardware
and/or software for producing and otherwise processing the image signals, not
specifically shown or described herein may be selected from such subject matters,
components, and elements known in the art. Given the description as set forth in
the following specification, all software implementation thereof is conventional
and within the ordinary skill in such arts.

Fig. 1 illustrates the preferred embodiment of the present invention
for processing an image with a specific image processing path in order to obtain an
enhanced output image. In general, the present invention performs an enhancement
operation on an image, the amount of enhancement being determined by the subject
matter(s) or objects within the image. Thus, the amount of enhancement applied to
individual images, or individual regions within a particular image, may vary
depending on the image content. The amount of enhancement applied to any
particular image or any particular region in an image is selected to be appropriate
for the specific image content.

In a preferred embodiment of the present invention, an image i(x,y)
having x_0 rows and y_0 columns is first reduced to a low resolution version in order
to decrease the processing time required by the present invention to detect
subject matters of interest and to generate the corresponding control signal for image
enhancement. Preferably the image i(x,y) is of high resolution, for example x_0 =
1024 rows of pixels by y_0 = 1536 columns of pixels. The low resolution image has
m_0 rows and n_0 columns, preferably m_0 = 256 and n_0 = 384 pixels. Many methods
of creating small images from larger images are known in the art of image
processing and can be used (see textbook: Gonzalez et al., Digital Image
Processing, Addison-Wesley, 1992). Alternatively, this step may be omitted.
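
As one possible illustration of the size reduction step, a 1024 by 1536 image can be reduced to 256 by 384 by averaging non-overlapping 4 by 4 blocks. The text above does not prescribe a particular reduction method, so block averaging and the function name downsample are assumptions of this Python sketch.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list of pixel values."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block_sum = sum(
                image[r + dr][c + dc] for dr in range(factor) for dc in range(factor)
            )
            out_row.append(block_sum / (factor * factor))
        out.append(out_row)
    return out
```

Calling downsample(image, 4) on an image of 1024 rows by 1536 columns yields the preferred 256 by 384 low resolution version.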

Referring to Fig. 1, there is shown a block diagram of the present
invention. A digital color image 10 is obtained. Next, a subject matter detector is
selected 20 from a collection of subject matter detectors 22 and then applied 24 to
the digital image. The result is a subject matter belief map 30, indicating the belief
that particular pixels or regions of pixels belong to a given target subject matter.
The target subject matter is a subject matter to which a particular image
enhancement operation is sensitive. Alternatively, a collection of target subject
matters can be pre-determined. The subject matter detector selected in step 20
outputs a subject matter belief map 30 M(m,n), preferably having the same pixel
dimensions in terms of rows and columns as the image input to the subject matter
detector. If the input digital image is reduced in size and leads to a belief map that
does not match the size of the original input image, the belief map can be
interpolated accordingly. The belief map indicates the belief that particular pixels
represent the target subject matter. The belief is preferably represented as a
probability. For example, each pixel value M(m,n) is equal to 100 * P( pixel (m,n)
of the low resolution image represents the target subject matter ), where P(A)
represents the probability of event A. Alternatively, each pixel value M(m,n) may
represent a binary classification indicating belief (after the belief values are
thresholded by a pre-determined value). For instance, a pixel value of 1 in the
belief map may represent belief that the pixel represents the target subject matter
with the highest probability and a pixel value of 0 may represent the belief that the
pixel does not represent the target subject matter with the highest probability. In
another alternative, the belief may be represented by a distance, for example a
distance in a feature space where generally higher distances correspond to lower
belief values and lower distances correspond to higher belief values.
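
Two of the belief map operations mentioned above, thresholding to a binary classification and resizing a low resolution belief map back to the size of the original image, might be sketched in Python as follows. The cutoff value, the use of nearest-neighbour lookup as the interpolation, and the function names are assumptions of this sketch; any suitable interpolation could be substituted.

```python
def threshold_belief(belief, cutoff):
    """Binary map: 1 where the belief value meets the cutoff, else 0."""
    return [[1 if value >= cutoff else 0 for value in row] for row in belief]


def upsample_nearest(belief, out_rows, out_cols):
    """Resize a belief map to out_rows x out_cols by nearest-neighbour lookup."""
    in_rows, in_cols = len(belief), len(belief[0])
    return [
        [belief[r * in_rows // out_rows][c * in_cols // out_cols] for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

For a belief map holding values of 100 * P, a cutoff of 50 marks as target every pixel whose probability is at least one half.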

In the preferred embodiment, one of the target subject matters is
human flesh. USSN 09/904,366, filed July 21, 2001 by Dupin et al., describes a
method of creating a belief map indicating the belief for a target subject matter of
flesh.

Referring to Fig. 9, one method that can be used with the present 
invention for implementing a subject matter detector 22 is illustrated. First, an 
input digital image is obtained using an obtain input digital image step 951. This
input digital image is then processed using an apply scene balance algorithm step 
952 to obtain an estimate of the appropriate color balance of the image using a 
conventional scene balance algorithm. Next, a produce initial scene balanced 
digital image step 953 is used to apply any necessary corrections to the input 
digital image. The scene-balanced digital image is then processed by an obtain 
probability values for skin-tone colored pixels step 954. An apply predetermined 
threshold step 955 is then applied to the probability values, followed by an obtain 
regions of skin tone colors step 956. 

One method that can be used for the obtain probability values for skin-tone
colored pixels step 954 is more completely described in the following. The pixel
RGB values are converted to "Lst" coordinates by the following equations:



L = (R+G+B)/sqrt(3)
s = (R-B)/sqrt(2)
t = (2G-R-B)/sqrt(6)
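
The conversion above can be written directly in Python, shown here only as an illustrative sketch; the function name rgb_to_lst is an assumption. Because the three coefficient vectors are orthonormal, the transform is a rotation of the RGB axes, so a neutral gray pixel maps to s = t = 0.

```python
import math


def rgb_to_lst(r, g, b):
    """Convert one RGB pixel to (L, s, t) coordinates using the equations above."""
    L = (r + g + b) / math.sqrt(3)
    s = (r - b) / math.sqrt(2)
    t = (2 * g - r - b) / math.sqrt(6)
    return L, s, t
```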






For each pixel in the input color digital image, the probability that it
is a skin-tone pixel is computed. The probability is derived from its coordinates in
the Lst space, based on predetermined skin-tone probability functions. These
probability functions were constructed based on a collection of data for the
color-space distributions of skin and non-skin regions in a large collection of scene
balanced images. The conditional probability that a pixel is a skin-tone pixel given
its Lst coordinates is:

Pr(Skin|L,s,t) = Pr(Skin|L)*Pr(Skin|s)*Pr(Skin|t)

where each of the conditional distributions Pr(Skin|L), Pr(Skin|s), and Pr(Skin|t)
is constructed by application of Bayes' theorem to the original training
distributions for skin and non-skin pixels. In comparison, other conventional
methods for detecting skin-tone colored pixels (see US Patent 4,203,671 issued
May 20, 1980 to Takahashi et al., and US Patent 5,781,276 issued July 14, 1998 to
Zahn et al.) use the likelihood probability P(color|Skin). One drawback of using
the conventional likelihood probability is that the probability distribution of non
skin-tone pixels is not accounted for. Consequently, there is a higher likelihood
of false detection.
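
A minimal Python sketch of how such conditional distributions might be built and combined is given below. It assumes that the training distributions are available as normalized histograms (dictionaries mapping a quantized coordinate value to a relative frequency) and that a prior probability of skin is known; the histogram representation, the prior, and the function names are all assumptions of this sketch rather than details given in the text.

```python
def posterior_skin(value, skin_hist, nonskin_hist, prior_skin):
    """Pr(Skin | value) by Bayes' theorem from skin and non-skin histograms."""
    p_skin = skin_hist.get(value, 0.0) * prior_skin
    p_non = nonskin_hist.get(value, 0.0) * (1.0 - prior_skin)
    evidence = p_skin + p_non
    return p_skin / evidence if evidence > 0 else 0.0


def skin_probability(lst_bins, skin_hists, nonskin_hists, prior_skin):
    """Product of per-axis posteriors: Pr(Skin|L) * Pr(Skin|s) * Pr(Skin|t)."""
    prob = 1.0
    for axis, bin_index in zip("Lst", lst_bins):
        prob *= posterior_skin(bin_index, skin_hists[axis], nonskin_hists[axis], prior_skin)
    return prob
```

Unlike a likelihood P(color|Skin) used alone, the denominator of posterior_skin includes the non-skin distribution, so a color that is common in both classes receives a correspondingly lower skin probability.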

The collection of probabilities for all pixels forms a skin-tone
probability distribution for the input image. The skin-tone probability distribution
is thresholded to create a binary map such that each pixel is designated as either
skin-tone or non skin-tone. Alternatively, a face detection algorithm can be used
to find human face regions in the input color digital image. Regions of skin-tone
colors are then extracted from the detected face regions. For a description of a face
detection method, see US Patent 5,710,833 issued January 20, 1998 to
Moghaddam et al.

Additionally, methods of creating belief maps for a target subject
matter of human flesh are described in the following articles: Cho et al., Adaptive
Skin-Color Filter, Pattern Recognition, 34 (2001) pp. 1067-1073; and Fleck et al.,
Finding Naked People, Proceedings of the European Conference on Computer
Vision, Vol. 2, 1996, pp. 592-602.

Alternatively, the target subject matter may be human faces, sky,
lawn grass, snow, water, or any other subject matter for which an automated
method exists for determining subject matter belief from an image. Human face
detection is described in many articles; see, for example, Heisele et al., Face
Detection in Still Gray Images, MIT Artificial Intelligence Lab, A.I. Memo
1687, C.B.C.L. Paper No. 187, May 2000. USSN 09/450,190, filed November 29,
1999 by Luo et al., describes the creation of belief maps when the target subject
matter is blue sky.

The major drawback of conventional techniques for subject matter
detection is that they cannot identify primary subject matters reliably because
they do not consider the unique characteristics of the subject matters. For
example, Saber et al., Automatic Image Annotation Using Adaptive Color
Classification, Graphical Models and Image Processing, Vol. 58, No. 2, March
1996, pp. 115-126, uses color classification only to detect sky. Consequently,
many types of sky-colored subject matters, such as clothing, man-made
object surfaces, and water, are frequently mistaken for sky. Furthermore, some of these
techniques have to rely on a priori knowledge of the image orientation. Failure
to reliably detect the presence of primary subject matters, in particular false
positive detection, may lead to failures in the downstream applications (e.g.,
falsely detected sky regions may lead to incorrect inference of image orientation).
Therefore, there is a need for a more robust primary subject detection method.

The need is met by providing a method for detecting subject matter
regions in a digital color image having pixels of (red, green, blue) values. This is
accomplished by the steps of assigning to each pixel a belief value as belonging to
the subject matter region based on color and texture features, forming spatially
contiguous candidate subject matter regions by thresholding the belief values,
analyzing the spatially contiguous regions based on one or more unique
characteristics of the subject matter to determine the probability that a region
belongs to the subject matter, and generating a map of detected subject matter
regions and associated probability that the regions belong to the subject matter.

Fig. 10 shows a method according to the present invention for
detecting subject matter such as clear blue sky or lawn grass. First, an input color
digital image is obtained 1011. Next, each pixel is assigned 1021 a subject matter
belief value in a color and texture pixel classification step based on color and
texture features by a suitably trained multi-layer neural network. Next, a region
extraction step 1023 is used to generate a number of candidate subject matter
regions. At the same time, the input image is processed by an open-space
detection step 1022 to generate an open-space map 1024 (as described in US
Patent 5,901,245 issued May 4, 1999 to Warnick et al., incorporated herein by
reference). Only candidate regions with significant (e.g., greater than 80%)
overlap with any region in the open-space map are "smooth" and will be selected
1026 for further processing. These selected candidate regions are analyzed 1028
for unique characteristics. In the case of blue sky, this unique characteristic is a
de-saturation effect, i.e., the degree of blueness decreases gradually towards the
horizon. This unique characteristic of blue sky is used to differentiate the true blue
sky regions from other blue-colored subject matters. In particular, a 2nd order
polynomial is used to fit a given smooth, sky-colored candidate region in red,
green, and blue channels, respectively. The coefficients of the polynomial are
classified by a trained neural network to decide whether a candidate region fits the
unique characteristic of blue sky. Only those candidate regions that exhibit these
unique characteristics are labeled as smooth blue sky regions. In the case of lawn
grass, the unique characteristics are a light and isotropic texture and a specific location
(near the bottom of the image and in contact with image borders). Only those
candidate regions that exhibit these unique characteristics are labeled as smooth
lawn grass regions. A subject matter belief map indicating the location and extent,
as well as the associated belief values, of detected subject matter regions is
generated 1030 as a result of the analysis. Other general subject matter detection
(e.g., snow and water) can be performed using a similar approach.
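
The selection of "smooth" candidate regions by their overlap with the open-space map might be sketched in Python as follows. For simplicity this sketch treats the open-space map as a single binary mask rather than as a set of separate open-space regions, and represents each candidate region as a binary mask of the same size; both simplifications and the function names are assumptions. The open-space detection itself (Warnick et al.) is not reproduced here.

```python
def overlap_fraction(region, open_space):
    """Fraction of a candidate region's pixels that fall inside the open-space mask."""
    region_pixels = 0
    shared_pixels = 0
    for region_row, open_row in zip(region, open_space):
        for in_region, in_open in zip(region_row, open_row):
            if in_region:
                region_pixels += 1
                if in_open:
                    shared_pixels += 1
    return shared_pixels / region_pixels if region_pixels else 0.0


def select_smooth_regions(candidates, open_space, min_overlap=0.8):
    """Keep only candidates whose overlap with open space exceeds min_overlap."""
    return [region for region in candidates if overlap_fraction(region, open_space) > min_overlap]
```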

Returning to Fig. 1, multiple subject matter detectors are selected
20, either in parallel or in series, to produce a set of belief maps M_i(x,y) where i
ranges from 1 to N. In an alternative case, one of the subject matter detectors may
be used to produce a belief map M_0(x,y) that corresponds to the belief that a given
pixel does not represent any of the target subject matters considered by any of
the selected subject matter detectors. Such a belief map would contain high (e.g.,
probabilities near 1.0) values only at pixel positions where all of the belief maps
M_i(x,y) contain low (e.g., probabilities near 0.0) values. In essence, the M_0(x,y)
belief map indicates the belief that a given pixel location will be considered as
background by all the selected subject matter detectors.

Continuing in Fig. 1, an image enhancement operation is selected
40 from a collection of pre-determined image enhancement operations 44. Based
on the selected image enhancement operation, an enhancement control signal 50 is
generated for controlling the amount of selected enhancement appropriate for the
target subject matter and the background subject matters. The selected image
enhancement is applied 60 according to the control signal to the input digital
image 10 to produce an enhanced image 70.

Various types of target subject matter can be detected and various
image operations can be performed according to the present invention. For
example, the present invention can be used to conservatively sharpen image
regions in which human flesh is detected and aggressively sharpen image regions
in which human flesh is not detected; conservatively sharpen image regions in
which clear sky is detected and aggressively sharpen image regions in which clear
sky is not detected. Multiple target subject matters can also be used as a criterion
for selective image enhancement. For example, the present invention can be used
to conservatively sharpen image regions in which human flesh or clear sky is
detected and aggressively sharpen image regions in which human flesh or clear sky
is not detected.

Alternatively or additionally, the present invention can be used to
conservatively remove noise in image regions in which grass lawn is detected, and
aggressively remove noise in image regions in which grass lawn is not
detected; aggressively remove noise in image regions in which human flesh is
detected, and conservatively remove noise in image regions in which human flesh
is not detected; conservatively remove noise in image regions in which bodies of
water are detected, and aggressively remove noise in image regions in
which bodies of water are not detected.

Still alternatively or additionally, the present invention can be used
to aggressively de-block image regions in which human flesh is detected and
conservatively de-block image regions in which human flesh is not detected;
aggressively de-block image regions in which clear blue sky is detected and
conservatively de-block image regions in which clear blue sky is not detected.

Still alternatively or additionally, the present invention can be used
to selectively modify the colors (color re-mapping) of certain target subject
matters. For example, the color of sky regions in an image can be modified to a
pre-determined "preferred" or "memory" color of the sky; the color of grass
regions in an image can be modified to a pre-determined "preferred" or "memory"
color of the lawn grass (e.g., that of a golf course); the color of skin regions in an
image can be modified in the direction of a pre-determined "preferred" or
"memory" color of the skin (e.g., tanned skin). Color re-mapping can include
modification of hue, lightness, or saturation of a color. Increasing the saturation of
a subject matter is an example of color re-mapping.

Still alternatively or additionally, the present invention can be used
to selectively adjust the scene balance (brightness and color balance) based on the
color and brightness of certain target subject matters. For example, the color and
brightness of the snow region can be used to determine the overall scene balance
or local scene balance if there are multiple light sources; the color and brightness
of the skin (face) region can also be used to determine the overall scene balance or
local scene balance if there are multiple light sources.

Still alternatively or additionally, the present invention can be used
to selectively adjust the tone scale of certain target subject matters. For example,
the color gradient of sky regions in an image can be modified to a pre-determined
"preferred" or "memory" tone scale (color gradient) of the sky; the contrast and
tone scale of face regions in an image can be modified in the direction of a
pre-determined "preferred" or "memory" tone scale of faces (e.g., that of studio
portraits).

Still alternatively or additionally, the present invention can be used 
to selectively interpolate an image according to a target subject matter. Image 
interpolation refers to the process of creating a larger, magnified version of an 
existing image. Typically, bilinear interpolation or cubic spline-based 
interpolation is used. It is desirable to apply different interpolation techniques in 
different subject matter areas. In general, planar regions of an image, e.g., blue 
sky, may be well served by an interpolation algorithm that implements a planar 
model, such as the well-known bilinear interpolation. Other target subject matters, 
e.g., grass lawn, foliage, or bodies of rippled water, may not be well served by a 
planar model and may require a more complicated model, such as a fractal-based 
model described in US Patent 6,141,017, issued October 31, 2000 to Cubillo et al.

In processing a digital image, it is well known to sharpen the image 
and enhance fine detail with sharpening algorithms. Typically, this sharpening is 
performed by a convolution process. For example, see Jain's textbook, 
Fundamentals of Digital Image Processing, published in 1989 by Prentice-Hall, 
pp. 249-250. The process of unsharp masking is an example of a convolution 
based sharpening. 

For example, sharpening an image with an unsharp mask can be 
described with the following equation: 

s(x,y) = i(x,y)**b(m,n) + β(x,y)f( i(x,y) - i(x,y)**b(m,n) ) (1)

where

s(x,y) = output image with enhanced sharpness
i(x,y) = original input image
b(m,n) = lowpass convolution filter
β(x,y) = control signal factor
f(x,y) = fringe function
** denotes two dimensional convolution
(x,y) denotes the x-th row and the y-th column of an image
(m,n) denotes the m-th row and the n-th column of the convolution filter

Typically, an unsharp image is generated by convolution of the
image with a lowpass filter (i.e., the unsharp image is given by i(x,y)**b(m,n)).
Next, the highpass signal is generated by subtracting the unsharp image from the
original image (the highpass signal is given by i(x,y) - i(x,y)**b(m,n)). This
highpass signal is then modified by a fringe function f(x,y) and a control signal
function β(x,y) which acts as an unsharp mask scale factor. Note that f() is
preferably an identity operation (i.e., does nothing). Finally, the modified highpass
signal is added to either the original image or the unsharp image to produce a
sharpened image. In a preferred embodiment of the present invention, the control
signal is represented by the collection of unsharp mask scale factors β(x,y). The
value of the control signal β(x,y) at any particular location (x,y) is related to the
value of various belief maps M_i(x,y) at the corresponding image locations.
Assuming that the size (in lines and columns) of the belief map is identical to the
size of the image, the preferred relationship between the control signal β(x,y) and
the belief maps M_i(x,y) is given by the equation:



$$\beta(x,y) = \frac{\sum_i \big( M_i(x,y)\,(T_i - T_0) \big)}{\max\big( \sum_i M_i(x,y),\; 1 \big)} + T_0$$

where i represents the index of the subject matter detector. For example, M_1(x,y)
may be a belief map representing the belief of human flesh, M_2(x,y) may be a
belief map representing the belief of blue sky, M_3(x,y) may be a belief map
representing the belief of grass, etc.

T_i represents the control signal target for a pixel having high belief
in the associated target subject matter. Continuing the above example, T_1 = 1.5 for
human flesh, T_2 = 1.5 for blue sky, T_3 = 6.0 for green grass, etc.

T_0 represents the control signal target for a pixel that is generally
considered to be background ("pure" background) by all the subject matter
detectors.

This embodiment does not assume that the subject matter detectors
are mutually exclusive. In other words, the value Σ_i(M_i(x,y)) may be greater
than 1.0 for some pixel position (x,y).

In an alternative embodiment,

$$\beta(x,y) = \sum_i \big( M_i(x,y)\, T_i \big) + \Big( 1.0 - \sum_i M_i(x,y) \Big)\, T_0$$

This example assumes that the subject matter detectors are
mutually exclusive. In other words, the value Σ_i(M_i(x,y)) is not greater than 1.0
at any pixel position (x,y).

Those skilled in the art of image processing will recognize that
many other equations may be used to relate the control signal to the collection of
belief maps M_i(x,y). For example, in the further alternative case where the belief
map M_0(x,y) indicates the belief that a given pixel does NOT represent any of
the target subject matters considered by any of the selected subject matter
detectors, the following relationship may be implemented:

$$\beta(x,y) = \sum_i \big( M_i(x,y)\, T_i \big) + M_0(x,y)\, T_0$$
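
For illustration, the two alternative relationships, namely the mutually exclusive form in which the background weight is 1.0 minus the summed beliefs and the form using an explicit background belief map M_0, can be evaluated at a single pixel as in the following Python sketch. The targets 1.5, 1.5 and 6.0 are the example values given above for flesh, sky and grass; the background target of 3.0 used in the usage note and the function names are assumptions introduced only for this sketch.

```python
def control_signal(beliefs, targets, background_target):
    """beta = sum_i(M_i * T_i) + (1.0 - sum_i M_i) * T_0 at one pixel.

    Assumes mutually exclusive detectors, i.e. the beliefs sum to at most 1.0.
    """
    total_belief = sum(beliefs)
    if total_belief > 1.0 + 1e-9:
        raise ValueError("beliefs must sum to at most 1.0 for this form")
    weighted = sum(m * t for m, t in zip(beliefs, targets))
    return weighted + (1.0 - total_belief) * background_target


def control_signal_with_background(beliefs, targets, background_belief, background_target):
    """beta = sum_i(M_i * T_i) + M_0 * T_0 at one pixel, with explicit background belief M_0."""
    weighted = sum(m * t for m, t in zip(beliefs, targets))
    return weighted + background_belief * background_target
```

With targets [1.5, 1.5, 6.0] and an assumed background target of 3.0, a pixel believed with certainty to be flesh receives 1.5, a pure background pixel receives 3.0, and a pixel with flesh belief 0.5 receives 2.25. If the background belief is taken as 1.0 minus the sum of the other beliefs, the two functions return the same value.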

Although a similar sharpening effect can be achieved by 
modification of the image in the frequency domain (for example, the FFT domain) 
as is well known in the art of digital signal processing, it is in general more 
difficult to perform spatial varying sharpening in the frequency domain. In other 
25 words, frequency domain techniques are in general not suitable for use with the 
present invention. 

In the previous example, the value of the control signal at a
particular location was shown to be dependent upon only the corresponding values
of one or more belief maps Mi(x,y). Alternatively, as shown in Fig. 8, the control
signal is dependent on other features that can be derived from the belief map from
an analysis performed by a belief map analyzer 32. The belief map analyzer 32
inputs the belief map and can output various signals related to the belief map. The
signals output by the belief map analyzer may be scalars such as the mean,
minimum, maximum, or variance of the belief map. Alternatively or additionally,
the signals output from the belief map analyzer can be other maps, preferably
having the same dimensions as the belief map(s). For example, the belief map
analyzer 32 may produce a location map Li(x,y) indicating the distance from each
pixel location to the center of the image or some other appropriate distance.
Additionally, the belief map analyzer 32 may produce a size map Si(x,y) in which
each pixel's value indicates the size of the region with which it is associated. By
using a connected component algorithm such as is well known in the art of image
processing, the size of each belief region having non-zero belief may be extracted
from the belief map. The size may be determined by counting the number of
pixels (or the percentage of total pixels) that belong to each belief region. The
signals output from the belief map analyzer are then input to the enhancement
control signal generator 34 in addition to the belief map. The enhancement control
signal generator 34 then creates an enhancement control signal in which the
control value at each location is dependent on the corresponding belief map
Mi(x,y) values, the location map Li(x,y), and the size map Si(x,y).
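
By way of illustration only, the size map described above may be sketched as a breadth-first connected component pass over a single belief map. The function name, the choice between 4-connected and 8-connected regions, and the use of a pixel count (rather than a percentage of total pixels) are assumptions made for the sketch.

```python
from collections import deque


def size_map(belief, connectivity=4):
    """Return a map whose value at each pixel is the pixel count of the
    connected non-zero-belief region containing it (0 where belief is zero)."""
    height, width = len(belief), len(belief[0])
    sizes = [[0] * width for _ in range(height)]
    seen = [[False] * width for _ in range(height)]
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if connectivity == 8:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or belief[y0][x0] <= 0:
                continue
            # Breadth-first flood fill collects one connected region.
            region = [(y0, x0)]
            seen[y0][x0] = True
            queue = deque(region)
            while queue:
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and belief[ny][nx] > 0):
                        seen[ny][nx] = True
                        region.append((ny, nx))
                        queue.append((ny, nx))
            # Every pixel of the region receives the region's pixel count.
            for y, x in region:
                sizes[y][x] = len(region)
    return sizes


# Hypothetical belief map with a 2-pixel region and a 3-pixel region.
belief = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.3],
]
print(size_map(belief))
# [[2, 2, 0, 0], [0, 0, 0, 3], [0, 0, 0, 3], [0, 0, 0, 3]]
```

Dividing each entry by the total number of pixels (here 16) would give the percentage form of the size map mentioned above.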

The belief map analyzer 32 allows for added flexibility in tuning the 
selected image enhancement operation for the target subject matter. Consider 
image sharpening again as an example. In general, image pixels representing 
human flesh ideally should receive less sharpening. However, the sharpening may 
be further refined by the realization that small areas of human flesh (e.g., small 
faces on the image) can tolerate higher levels of sharpening without the appearance 
of visually unpleasant blemishes or wrinkles than larger areas of human flesh. The 
size map Si(x,y) allows the enhancement control signal generator 34 to adapt the
control signal based on the belief value and the value within the size map.
Consequently, small faces are sharpened more than large faces, both of which are
sharpened less than non-face regions.
For example, the following relationship may be implemented by the
enhancement control signal generator 34 to generate the control signal:

$$\beta(x,y) = \sum_i k\big( S_i(x,y) \big)\,M_i(x,y)\,T_i + T_0$$

where k() is a function that controls the effect of the region size on the sharpening
parameter. The value of k() varies monotonically from 0 for small values of
Si(x,y) to 1.0 for large values of Si(x,y).

Referring to Fig. 2, there is shown a typical snapshot image
containing a person (whose face is marked as skin region 100), foliage 101, cloud
sky 102, clear blue sky 103, and lawn grass 104. Such an image may also contain
a snow field (not shown) and a body of water (not shown). Figs. 3-7 show the
associated belief maps when the target subject matter is human flesh, sky, lawn
grass, snow field, and water body, respectively. A null belief map contains only
pixels of zero belief values and indicates that the corresponding subject matter is
not present in the image (e.g., no water region can be detected in Fig. 7). Note
that a given belief map only relates to one type of subject matter. In other words,
a sky region is indicated as background region in the human skin belief map, and
vice versa. The "pure" background regions 110 of the image are made up of the
pixels having a belief of zero that the corresponding pixel represents any of the
target subject matters. It is also possible that a pixel or a region has non-zero
belief values in more than one belief map, indicating that this pixel may belong to
more than one type of subject matter (e.g., region 101 in Fig. 2 may have a
probability of 0.3 of belonging to grass and a probability of 0.7 of belonging to
green foliage). Fig. 3 shows an example of a belief map generated by the subject
matter detector when the target subject matter is human flesh, with belief values
indicated for the different subject matters of the image. For example, the face
region has a belief value of 95 on a scale of 0 to 100 for human skin, the tree
trunk has a belief value of 30, and the background region has a belief value of 0.
Fig. 4 shows an example of a belief map generated by the subject matter detector
when the target subject matter is sky, with the sky having a belief value of 100.
Fig. 5 shows an example of a belief map generated by the subject matter detector
when the target subject matter is grass, with the grass having a belief value of 100
and the tree leaves having a belief value of 30. Fig. 6 shows an example of a
belief map generated by the subject matter detector when the target subject matter
is cloud and snow, with the cloud having a belief value of 80. Fig. 7 shows an
example of a belief map generated by the subject matter detector when the target
subject matter is water and there is no open water in the scene.
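
By way of illustration only, the "pure" background regions and the pixels claimed by more than one detector may be located from a collection of belief maps as sketched below. The function names and the three-pixel belief maps are hypothetical; the belief values use the 0 to 100 scale of the examples above.

```python
def pure_background_mask(belief_maps):
    """True where every belief map reports zero belief at that pixel."""
    height, width = len(belief_maps[0]), len(belief_maps[0][0])
    return [[all(m[y][x] == 0 for m in belief_maps) for x in range(width)]
            for y in range(height)]


def overlap_mask(belief_maps):
    """True where more than one belief map reports non-zero belief."""
    height, width = len(belief_maps[0]), len(belief_maps[0][0])
    return [[sum(1 for m in belief_maps if m[y][x] > 0) > 1
             for x in range(width)]
            for y in range(height)]


# Hypothetical 1x3 image: a face pixel, an ambiguous grass/foliage pixel,
# and a pixel that no detector claims.
skin = [[95, 0, 0]]
grass = [[0, 30, 0]]
foliage = [[0, 70, 0]]
maps = [skin, grass, foliage]
print(pure_background_mask(maps))  # [[False, False, True]]
print(overlap_mask(maps))          # [[False, True, False]]
```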

Returning to Fig. 1, alternatively or additionally, the image
enhancement operation may be noise reduction, de-blocking, scene balance
adjustment, tone scale adjustment, color re-mapping, image interpolation, or any
other operations with which one or more attributes of an image can be enhanced.

JPEG de-blocking refers to removing visually objectionable
artificial block boundaries that result from JPEG image compression (see Luo et
al., Artifact Reduction in Low Bit Rate DCT-Based Image Compression, IEEE
Transactions on Image Processing, Vol. 5, No. 9, September 1996, pp. 1363-1368).
In general, it is desirable to aggressively remove the blocking artifacts in
human faces and in the smooth gradient of clear blue sky, where the blocking
artifacts are most visible and objectionable. However, de-blocking algorithms
usually tend to remove image details in such areas as lawn grass, snow fields, and
bodies of water, where the blocking artifacts are least visible. Therefore, it is
advantageous to apply different amounts of de-blocking to areas of different
subject matters. Furthermore, it may be desirable to apply no de-blocking at all to
subject matter regions of high texture content because such regions naturally hide
blocking artifacts but do not tolerate loss of details.

While in general the information loss due to the quantization in the
JPEG compression process may be impractical to recover, some coding artifacts
can be alleviated through the incorporation of image smoothness constraints using
an appropriate image prior model.

There are several convex potential functions that have been used to
enforce an image smoothness constraint. Convex functions are often desired because
the convergence of a convex constrained problem is always guaranteed if solutions
do exist, and it can also be optimized efficiently due to fast convergence. More
importantly, with the variable being the difference between neighboring pixels,
convex potential functions with smooth transition, i.e., with good continuity
properties, result in desired continuity in the image.

A specific Gibbs random field called the Huber-Markov random 
field (HMRF) is preferred. Its potential function Vc,T(x) is in the form of 



$$V_{c,T}(x) = \begin{cases} x^2, & |x| \le T \\ T^2 + 2T\,(|x| - T), & |x| > T \end{cases}$$



where T is a threshold. If we define the gray level differences between the current
pixel $x_{m,n}$ and the pixels within its neighborhood $N_{m,n}$ as:

$$\{\, x_{m,n} - x_{k,l} \,\}, \quad (k,l) \in N_{m,n}$$

then these differences can substitute for the variable of the Huber minimax
function. A useful property of the Huber minimax function is its ability to
smooth certain types of artifacts while still preserving image detail, such as
edges and regions of texture. The quadratic segment of the function imposes least
mean square smoothing of the artifacts when the local variation is below the
threshold T. On the other hand, the linear segment of the function enables the
preservation of image details by allowing large discontinuities in the image with a
much lighter penalty.
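
By way of illustration only, the Huber minimax potential (quadratic for differences of magnitude up to T, linear beyond T) and its evaluation over the gray level differences around one pixel may be sketched as follows. The function names are hypothetical, and the use of the eight immediately adjacent pixels as the neighborhood is an assumption, since the extent of the neighborhood is not fixed above.

```python
def huber_potential(x, threshold):
    """Huber minimax potential: quadratic up to the threshold, linear beyond."""
    magnitude = abs(x)
    if magnitude <= threshold:
        return magnitude ** 2
    return threshold ** 2 + 2 * threshold * (magnitude - threshold)


def neighborhood_penalty(image, m, n, threshold):
    """Sum the potential over the differences between pixel (m, n) and its
    adjacent pixels (assumed 8-connected neighborhood, clipped at borders)."""
    height, width = len(image), len(image[0])
    total = 0
    for k in range(max(0, m - 1), min(height, m + 2)):
        for l in range(max(0, n - 1), min(width, n + 2)):
            if (k, l) != (m, n):
                total += huber_potential(image[m][n] - image[k][l], threshold)
    return total


print(huber_potential(3, 10))   # 9   (quadratic segment: 3 squared)
print(huber_potential(40, 10))  # 700 (linear segment; a pure quadratic gives 1600)

# Hypothetical 3x3 patch: small variation plus one strong edge pixel.
patch = [[10, 10, 10],
         [10, 12, 10],
         [10, 10, 50]]
print(neighborhood_penalty(patch, 1, 1, threshold=10))  # 688
```

In the patch example the seven small differences contribute 4 each, while the large difference of 38 contributes 660 rather than the 1444 a purely quadratic penalty would impose, which is how the linear segment tolerates genuine discontinuities.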

The switching capability of the HMRF model is very important in
distinguishing discontinuities of different nature. However, this switching
property is still inadequate when we need to distinguish image details from the
artifacts. Without semantics, a single value of the threshold T cannot accurately
describe all the discontinuities, and is not sufficient to differentiate true image
edges from artifacts. With knowledge of detected subject matters, one can select
proper threshold T values for different subject matters. For example, a larger
value of the threshold T (T1=10) is chosen in the HMRF model for those pixels in
the regions corresponding to skin and blue sky to smooth the artifacts, a moderate
value of the threshold T (T2=5) is applied to the regions corresponding to lawn
grass and rippled water, and a zero value of the threshold T (T=0) is applied to the
highly textured regions such as foliage.
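
By way of illustration only, the selection of the threshold T by subject matter may be sketched as a lookup from a per-pixel label to a threshold. The thresholds 10, 5 and 0 are those of the example above; the label strings, the function names, and the assumption that each pixel carries a single label are hypothetical. The potential is the quadratic-then-linear Huber form described earlier.

```python
# Thresholds from the example above; the label strings are hypothetical.
HMRF_THRESHOLDS = {
    "skin": 10,
    "blue_sky": 10,
    "lawn_grass": 5,
    "rippled_water": 5,
    "foliage": 0,
}


def huber_potential(difference, threshold):
    """Quadratic for |difference| <= threshold, linear beyond it."""
    magnitude = abs(difference)
    if magnitude <= threshold:
        return magnitude ** 2
    return threshold ** 2 + 2 * threshold * (magnitude - threshold)


def threshold_map(label_map):
    """Map each pixel's subject matter label to its HMRF threshold.
    An unknown label raises KeyError rather than guessing a threshold."""
    return [[HMRF_THRESHOLDS[label] for label in row] for row in label_map]


labels = [["skin", "lawn_grass", "foliage"]]
thresholds = threshold_map(labels)
# Penalty on the same gray level difference of 8 in each region.
penalties = [huber_potential(8, t) for t in thresholds[0]]
print(thresholds)  # [[10, 5, 0]]
print(penalties)   # [64, 55, 0]
```

With T = 0 the potential evaluates to zero for every difference, so the prior imposes no smoothing at all in the highly textured regions.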

De-noising refers to noise reduction in images. See for example a
nonlinear filter described by Lee in Digital Image Smoothing and the Sigma
Filter, Computer Vision, Graphics, Image Processing, Vol. 24, pp. 189-198, April
1983. A nonlinear filter such as the σ-filter has the advantage of better preserving
image details when removing noise than linear filters. The local average of
neighboring pixel values that are within a difference of σ of the current pixel value
is used to replace the current pixel value. Clearly, edges of large magnitude are
preserved this way while noise of low magnitude is removed.
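
By way of illustration only, the σ-filter as worded above (the average of the neighboring pixel values that lie within σ of the current pixel value) may be sketched as follows. The function name and the window radius parameter are hypothetical, and the sketch follows the wording of this description rather than any particular published variant of the filter.

```python
def sigma_filter(image, sigma, radius=1):
    """Replace each pixel by the mean of the window pixels whose values lie
    within sigma of that pixel's own value."""
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if abs(image[ny][nx] - center) <= sigma:
                        total += image[ny][nx]
                        count += 1
            # count is at least 1 because the center pixel always qualifies.
            output[y][x] = total / count
    return output


# Hypothetical scan line: low-magnitude noise on both sides of a strong edge.
row = [[10, 12, 11, 100, 102, 101]]
print(sigma_filter(row, sigma=5))
# [[11.0, 11.0, 11.5, 101.0, 101.0, 101.5]]
```

The small fluctuations on either side are averaged out, while the step from about 11 to about 101 is left intact because pixels across the edge differ by more than σ and are excluded from each other's averages.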

The subject matter of the present invention relates to digital image 
understanding technology, which is understood to mean technology that digitally 
processes a digital image to recognize and thereby assign useful meaning to human 
understandable objects, attributes or conditions and then to utilize the results 
obtained in the further processing of the digital image.

The present invention may be implemented for example in a 
computer program product. A computer program product may include one or 
more storage media, for example, magnetic storage media such as magnetic disk 
(such as a floppy disk) or magnetic tape; optical storage media such as optical 
disk, optical tape, or machine readable bar code; solid-state electronic storage
devices such as random access memory (RAM) or read-only memory (ROM); or
any other physical device or media employed to store a computer program having 
instructions for controlling one or more computers to practice the method 
according to the present invention. 

The present invention has been described with reference to a 
preferred embodiment. Changes may be made to the preferred embodiment 
without deviating from the scope of the present invention. 